Dynamic Workload for Schema Evolution in Data Warehouses: a Performance Issue

Author

  • Fadila Bentayeb
Abstract

A data warehouse integrates heterogeneous data sources for identified analysis purposes. Its schema is designed according to the available data sources and the users' analysis requirements. To answer new individual analysis needs, we proposed in previous work a solution for on-line analysis personalization. That solution relies on a user-driven approach to data warehouse schema evolution, which consists in creating new hierarchy levels in OLAP (On-Line Analytical Processing) dimensions. As the acronym suggests, one of the main objectives of OLAP is performance during the analysis process. Since data warehouses contain large volumes of data, answering decision queries efficiently requires particular access methods. The main approach is to use redundant optimization structures such as materialized views and indices. This requires selecting an appropriate set of materialized views and indices that minimizes total query response time within a limited storage space. A judicious selection must be cost-driven and based on a workload, that is, a set of users' queries on the data warehouse. In this chapter, we address the evolution and maintenance of the workload in data warehouse systems in response to new requirements arising from users' personalized analysis needs. The main goal is to avoid regenerating the workload from scratch. Hence, we propose a workload management system that helps the administrator maintain and dynamically adapt the workload according to changes in the data warehouse schema. To achieve this maintenance, we propose two types of workload updates: (1) keeping existing queries consistent with the new data warehouse schema and (2) creating new queries based on the new dimension hierarchy levels.
Our system helps the administrator adopt a proactive approach to managing data warehouse performance. To validate our workload management system, we address the implementation issues of our prototype, which was developed within a client/server architecture with a web client interfaced with the Oracle 10g database management system.
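
The two types of workload updates named in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the `Query` representation, the three schema-change classes, and the functions `maintain` and `extend` are hypothetical names introduced here. The sketch also assumes that a query grouping on a deleted level is simply dropped and that a new query is derived by substituting the new level for the level it aggregates. The abstract does not confirm either choice.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple, Union

# Hypothetical model: a query is reduced to its aggregate expression and the
# (dimension, level) pairs it groups by. These names are not from the chapter.
@dataclass(frozen=True)
class Query:
    measure: str
    group_by: Tuple[Tuple[str, str], ...]

@dataclass(frozen=True)
class LevelRenamed:
    dimension: str
    old: str
    new: str

@dataclass(frozen=True)
class LevelDeleted:
    dimension: str
    level: str

@dataclass(frozen=True)
class LevelCreated:
    dimension: str
    level: str
    above: str  # existing level that the new level aggregates

Change = Union[LevelRenamed, LevelDeleted, LevelCreated]


def maintain(workload: List[Query], change: Change) -> List[Query]:
    """Update type (1): keep existing queries consistent with the new schema."""
    result = []
    for q in workload:
        if isinstance(change, LevelRenamed):
            # Rewrite every reference to the renamed level.
            gb = tuple(
                (dim, change.new) if (dim, lvl) == (change.dimension, change.old)
                else (dim, lvl)
                for dim, lvl in q.group_by
            )
            result.append(replace(q, group_by=gb))
        elif isinstance(change, LevelDeleted):
            # Assumption: a query that groups by a deleted level is dropped.
            if (change.dimension, change.level) not in q.group_by:
                result.append(q)
        else:
            # Creating a level does not invalidate any existing query.
            result.append(q)
    return result


def extend(workload: List[Query], change: Change) -> List[Query]:
    """Update type (2): add queries that group by a newly created level."""
    if not isinstance(change, LevelCreated):
        return list(workload)
    added = []
    for q in workload:
        if (change.dimension, change.above) in q.group_by:
            # Assumption: derive the new query by swapping in the new level.
            gb = tuple(
                (dim, change.level) if (dim, lvl) == (change.dimension, change.above)
                else (dim, lvl)
                for dim, lvl in q.group_by
            )
            candidate = replace(q, group_by=gb)
            if candidate not in workload and candidate not in added:
                added.append(candidate)
    return list(workload) + added


workload = [Query("SUM(amount)", (("customer", "city"),))]
change = LevelCreated("customer", "region", above="city")
updated = extend(maintain(workload, change), change)
```

Here `updated` holds the original query plus a derived one that groups by the hypothetical `region` level, so the workload follows the schema change without being regenerated from scratch.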

Similar resources

Managing Source Schema Evolution in Web Warehouses

Web Data Warehouses have been introduced to enable the analysis of integrated Web data. One of the main challenges in these systems is to deal with the volatile and dynamic nature of Web sources. In this work we address the effects of adding/removing/changing Web sources and data items to the Data Warehouse (DW) schema. By managing source evolution we mean the automatic propagation of these cha...

Preface Chapter 2, "Dynamic Workload for Schema Evolution in Data Warehouses: a Performance Issue"

Data warehousing and knowledge discovery are established key technologies in many application domains. Enterprises and organizations improve their abilities in data analysis, decision support, and the automatic extraction of knowledge from data; for scientific applications to analyze collected data, for medical applications for quality assurance and for steps to individualized medicine, to ment...

Repository Support for Data Warehouse Evolution

Data warehouses are complex systems consisting of many components which store highly aggregated data for decision support. Due to the role of the data warehouses in the daily business work of an enterprise, the requirements for the design and the implementation are dynamic and subjective. Therefore, data warehouse design is a continuous process which has to reflect the changing environment of a ...

Storage Layout and I/O Performance in Data Warehouses

Defining data placement and allocation in the disk subsystem can have a significant impact on data warehouse performance. However, our experiences with data warehouse implementations show that the database storage layout is often subject to vague or even invalid assumptions about I/O performance trade-offs. Clear guidelines for the assignment of database objects to disks are a very common reque...

Dynamic Query Scheduling in Parallel Data Warehouses

Parallel processing is a key to high performance in very large data warehouse applications that execute complex analytical queries on huge amounts of data. Although parallel database systems (PDBSs) have been studied extensively in the past decades, the specifics of load balancing in parallel data warehouses have not been addressed in detail. In this study, we investigate how the load balancing...

Publication date: 2011